client, server systemd units: make Restart=always truly respected #184

Merged: mcginty merged 2 commits into main from systemd-unit-really-restart-always on Jan 11, 2022

@strohel (Member) commented Jan 6, 2022

Surprisingly, Restart=always does not always restart the unit: if the unit restarts too quickly,
systemd's start rate limiting can give up on it.

Set a combination of options which should make systemd truly restart innernet always.
See https://unix.stackexchange.com/q/289629/352972.

RestartSec=60 is the important option: with it, systemd can never exceed the default limit of
5 restarts in 10 seconds, so under the default settings it never gives up on restarting
innernet.

StartLimitIntervalSec=0 is a complementary option that explicitly disables the rate-limiting
logic; it may be removed from this PR if deemed unnecessary.
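
The arithmetic behind these two options can be sketched with a small simulation. This is a simplified sliding-window model, not systemd's actual implementation, and it assumes the service fails instantly on every start. The function name and its parameters are hypothetical; the defaults of 5 starts per 10 seconds are the ones quoted above, and 0.1 s stands for systemd's default RestartSec of 100 ms.

```python
from collections import deque


def starts_before_giving_up(restart_sec, interval_sec=10.0, burst=5, max_attempts=1000):
    """Return how many starts happen before the start limit trips.

    Returns None if the limit never trips within max_attempts, or if
    rate limiting is disabled (interval_sec == 0).
    Assumption: the service exits immediately after every start.
    """
    if interval_sec == 0:
        return None  # models StartLimitIntervalSec=0: no rate limiting at all

    recent = deque()  # start times still inside the current window
    now = 0.0
    for attempt in range(max_attempts):
        # Forget starts that are older than the rate-limit window.
        while recent and now - recent[0] >= interval_sec:
            recent.popleft()
        if len(recent) >= burst:
            return attempt  # this start would be refused
        recent.append(now)
        now += restart_sec  # instant failure, then wait RestartSec
    return None


if __name__ == "__main__":
    for delay in (0.1, 60):
        print(delay, starts_before_giving_up(delay))
    print("limit disabled", starts_before_giving_up(0.1, interval_sec=0))
```

Under this model a 100 ms delay trips the limit after the fifth start, while a 60 s delay never puts more than one start inside a 10 s window, so the limit is never reached.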

Should fix tonarino/portal#1441 (link to issue in private repository).


I've tested this successfully with the client unit file (on an artificial problem though):

led 06 17:55:07 mat480s systemd[1]: Started innernet client daemon for tonari.
led 06 17:55:07 mat480s systemd[693653]: innernet@tonari.service: Failed to locate executable /usr/bin/innernet: No such file or directory
led 06 17:55:07 mat480s systemd[693653]: innernet@tonari.service: Failed at step EXEC spawning /usr/bin/innernet: No such file or directory
led 06 17:55:07 mat480s systemd[1]: innernet@tonari.service: Main process exited, code=exited, status=203/EXEC
led 06 17:55:07 mat480s systemd[1]: innernet@tonari.service: Failed with result 'exit-code'.


led 06 17:56:07 mat480s systemd[1]: innernet@tonari.service: Scheduled restart job, restart counter is at 1.
led 06 17:56:07 mat480s systemd[1]: Stopped innernet client daemon for tonari.
led 06 17:56:07 mat480s systemd[1]: Started innernet client daemon for tonari.
led 06 17:56:07 mat480s systemd[693719]: innernet@tonari.service: Failed to locate executable /usr/bin/innernet: No such file or directory
led 06 17:56:07 mat480s systemd[693719]: innernet@tonari.service: Failed at step EXEC spawning /usr/bin/innernet: No such file or directory
led 06 17:56:07 mat480s systemd[1]: innernet@tonari.service: Main process exited, code=exited, status=203/EXEC
led 06 17:56:07 mat480s systemd[1]: innernet@tonari.service: Failed with result 'exit-code'.

led 06 17:57:07 mat480s systemd[1]: innernet@tonari.service: Scheduled restart job, restart counter is at 2.
led 06 17:57:07 mat480s systemd[1]: Stopped innernet client daemon for tonari.
led 06 17:57:07 mat480s systemd[1]: Started innernet client daemon for tonari.
led 06 17:57:07 mat480s systemd[693733]: innernet@tonari.service: Failed to locate executable /usr/bin/innernet: No such file or directory
led 06 17:57:07 mat480s systemd[693733]: innernet@tonari.service: Failed at step EXEC spawning /usr/bin/innernet: No such file or directory
led 06 17:57:07 mat480s systemd[1]: innernet@tonari.service: Main process exited, code=exited, status=203/EXEC
led 06 17:57:07 mat480s systemd[1]: innernet@tonari.service: Failed with result 'exit-code'.


led 06 17:58:08 mat480s systemd[1]: innernet@tonari.service: Scheduled restart job, restart counter is at 3.
led 06 17:58:08 mat480s systemd[1]: Stopped innernet client daemon for tonari.
led 06 17:58:08 mat480s systemd[1]: Started innernet client daemon for tonari.
led 06 17:58:08 mat480s systemd[693820]: innernet@tonari.service: Failed to locate executable /usr/bin/innernet: No such file or directory
led 06 17:58:08 mat480s systemd[693820]: innernet@tonari.service: Failed at step EXEC spawning /usr/bin/innernet: No such file or directory
led 06 17:58:08 mat480s systemd[1]: innernet@tonari.service: Main process exited, code=exited, status=203/EXEC
led 06 17:58:08 mat480s systemd[1]: innernet@tonari.service: Failed with result 'exit-code'.
led 06 17:59:08 mat480s systemd[1]: innernet@tonari.service: Scheduled restart job, restart counter is at 4.
led 06 17:59:08 mat480s systemd[1]: Stopped innernet client daemon for tonari.
led 06 17:59:08 mat480s systemd[1]: Started innernet client daemon for tonari.
led 06 17:59:08 mat480s systemd[693865]: innernet@tonari.service: Failed to locate executable /usr/bin/innernet: No such file or directory
led 06 17:59:08 mat480s systemd[693865]: innernet@tonari.service: Failed at step EXEC spawning /usr/bin/innernet: No such file or directory
led 06 17:59:08 mat480s systemd[1]: innernet@tonari.service: Main process exited, code=exited, status=203/EXEC
led 06 17:59:08 mat480s systemd[1]: innernet@tonari.service: Failed with result 'exit-code'.


led 06 18:00:08 mat480s systemd[1]: innernet@tonari.service: Scheduled restart job, restart counter is at 5.
led 06 18:00:08 mat480s systemd[1]: Stopped innernet client daemon for tonari.
led 06 18:00:08 mat480s systemd[1]: Started innernet client daemon for tonari.
led 06 18:00:08 mat480s systemd[1]: innernet@tonari.service: Main process exited, code=exited, status=203/EXEC
led 06 18:00:08 mat480s systemd[1]: innernet@tonari.service: Failed with result 'exit-code'.



led 06 18:01:08 mat480s systemd[1]: innernet@tonari.service: Scheduled restart job, restart counter is at 6.
led 06 18:01:08 mat480s systemd[1]: Stopped innernet client daemon for tonari.
led 06 18:01:08 mat480s systemd[1]: Started innernet client daemon for tonari.
led 06 18:01:08 mat480s systemd[693907]: innernet@tonari.service: Failed to locate executable /usr/bin/innernet: No such file or directory
led 06 18:01:08 mat480s systemd[693907]: innernet@tonari.service: Failed at step EXEC spawning /usr/bin/innernet: No such file or directory
led 06 18:01:08 mat480s systemd[1]: innernet@tonari.service: Main process exited, code=exited, status=203/EXEC
led 06 18:01:08 mat480s systemd[1]: innernet@tonari.service: Failed with result 'exit-code'.


led 06 18:02:09 mat480s systemd[1]: innernet@tonari.service: Scheduled restart job, restart counter is at 7.
led 06 18:02:09 mat480s systemd[1]: Stopped innernet client daemon for tonari.
led 06 18:02:09 mat480s systemd[1]: Started innernet client daemon for tonari.
led 06 18:02:09 mat480s systemd[693935]: innernet@tonari.service: Failed to locate executable /usr/bin/innernet: No such file or directory
led 06 18:02:09 mat480s systemd[693935]: innernet@tonari.service: Failed at step EXEC spawning /usr/bin/innernet: No such file or directory
led 06 18:02:09 mat480s systemd[1]: innernet@tonari.service: Main process exited, code=exited, status=203/EXEC
led 06 18:02:09 mat480s systemd[1]: innernet@tonari.service: Failed with result 'exit-code'.

led 06 18:03:09 mat480s systemd[1]: innernet@tonari.service: Scheduled restart job, restart counter is at 8.
led 06 18:03:09 mat480s systemd[1]: Stopped innernet client daemon for tonari.
led 06 18:03:09 mat480s systemd[1]: Started innernet client daemon for tonari.
led 06 18:03:09 mat480s systemd[694037]: innernet@tonari.service: Failed to locate executable /usr/bin/innernet: No such file or directory
led 06 18:03:09 mat480s systemd[694037]: innernet@tonari.service: Failed at step EXEC spawning /usr/bin/innernet: No such file or directory
led 06 18:03:09 mat480s systemd[1]: innernet@tonari.service: Main process exited, code=exited, status=203/EXEC
led 06 18:03:09 mat480s systemd[1]: innernet@tonari.service: Failed with result 'exit-code'.
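
A quick way to check that the restarts in a log like the one above are spaced by RestartSec is to extract the "Scheduled restart job" lines and compute the gaps between them. The sketch below is a hypothetical helper, not part of innernet; it reads only the HH:MM:SS field, so it assumes the log does not cross midnight. The sample lines are copied from the log above.

```python
import re
from datetime import datetime

# Matches the time of day and the restart counter in a journal line.
RESTART_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2}) .*Scheduled restart job, restart counter is at (\d+)\."
)

SAMPLE = """\
led 06 17:56:07 mat480s systemd[1]: innernet@tonari.service: Scheduled restart job, restart counter is at 1.
led 06 17:56:07 mat480s systemd[1]: Stopped innernet client daemon for tonari.
led 06 17:57:07 mat480s systemd[1]: innernet@tonari.service: Scheduled restart job, restart counter is at 2.
led 06 17:58:08 mat480s systemd[1]: innernet@tonari.service: Scheduled restart job, restart counter is at 3.
"""


def restart_gaps(journal_text):
    """Return (restart_counter, seconds_since_previous_restart) pairs."""
    events = []
    for line in journal_text.splitlines():
        match = RESTART_RE.search(line)
        if match:
            when = datetime.strptime(match.group(1), "%H:%M:%S")
            events.append((int(match.group(2)), when))
    # Pair each restart with the one before it and measure the distance.
    return [
        (counter, int((when - prev_when).total_seconds()))
        for (_, prev_when), (counter, when) in zip(events, events[1:])
    ]


if __name__ == "__main__":
    for counter, gap in restart_gaps(SAMPLE):
        print(f"restart {counter}: {gap}s after the previous one")
```

Applied to the full log, the gaps come out at 60 or 61 seconds, which matches RestartSec=60 plus timer slack.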

[Service]
Type=simple
Environment="RUST_LOG=info"
ExecStart=/usr/bin/innernet-server serve %i
Restart=always
# When the daemon exits, wait this amount of secs before restarting. Prevents innernet from
# start-looping each 100ms for example when there is a problem reaching the server.
RestartSec=60
Member Author

For the client, a restart interval of 60 secs is somewhat natural, as it is equal to the fetch interval.

But I'm not sure what the server restart interval should be. Perhaps a bit less?

Collaborator

Yeah, I think it should be closer to 1s than 60s.

Collaborator

@mcginty mcginty left a comment

Thanks for opening this! Makes sense, especially to have the client try harder to keep the interface updated :).

@@ -2,12 +2,18 @@
Description=innernet server for %I
After=network-online.target nss-lookup.target
Wants=network-online.target nss-lookup.target
# Disable systemd's unit start rate limiting logic, which could override Restart=always.
# See https://unix.stackexchange.com/q/289629/352972
StartLimitIntervalSec=0
Collaborator

I'm not so sure we want to modify this unit much at all - if the server fails it's more likely to be a permanent problem and the typical systemd "give up" logic makes more sense, in my opinion.

Personally, I'd suggest we should only slightly increase the RestartSec value and leave it otherwise as-is.

Member Author

> I'm not so sure we want to modify this unit much at all - if the server fails it's more likely to be a permanent problem and the typical systemd "give up" logic makes more sense, in my opinion.

That's a good point. I've made the changes to the systemd units more conservative in a fixup commit to resolve this and the 2 other comments.


[Service]
Type=simple
ExecStart=/usr/bin/innernet up %i --daemon --interval 60
Restart=always
# When the daemon exits, wait this amount of secs before restarting. Prevents innernet from
# start-looping each 100ms for example when there is a problem reaching the server.
RestartSec=60
Collaborator

If it's ok with you, let's make this 10 rather than 60 since it may be a temporary issue and we should be retrying harder if a failure has caused innernet to not fetch peers and update the interface.

@strohel strohel requested a review from mcginty January 11, 2022 16:29
@mcginty mcginty merged commit 1b26082 into main Jan 11, 2022
@mcginty mcginty deleted the systemd-unit-really-restart-always branch January 11, 2022 19:58